Estimating the time-varying reproduction number for COVID-19 in South Africa during the first four waves using multiple measures of incidence for public and private sectors across four waves

Objectives The aim of this study was to quantify transmission trends in South Africa during the first four waves of the COVID-19 pandemic using estimates of the time-varying reproduction number (R) and to compare the robustness of R estimates based on three different data sources, and using data from public and private sector service providers. Methods R was estimated from March 2020 through April 2022, nationally and by province, based on time series of rt-PCR-confirmed cases, hospitalisations, and hospital-associated deaths, using a method that models daily incidence as a weighted sum of past incidence, as implemented in the R package EpiEstim. R was also estimated separately using public and private sector data. Results Nationally, the maximum case-based R following the introduction of lockdown measures was 1.55 (CI: 1.43–1.66), 1.56 (CI: 1.47–1.64), 1.46 (CI: 1.38–1.53) and 3.33 (CI: 2.84–3.97) during the first (Wuhan-Hu), second (Beta), third (Delta), and fourth (Omicron) waves, respectively. Estimates based on the three data sources (cases, hospitalisations, deaths) were generally similar during the first three waves, but higher during the fourth wave for case-based estimates. Public and private sector R estimates were generally similar except during the initial lockdowns and in case-based estimates during the fourth wave. Conclusion Agreement between R estimates using different data sources during the first three waves suggests that data from any of these sources could be used in the early stages of a future pandemic. The high R estimates for Omicron relative to earlier waves are interesting given a high level of exposure pre-Omicron. The agreement between public and private sector R estimates highlights that clients of the public and private sectors did not experience two separate epidemics, except perhaps to a limited extent during the strictest lockdowns in the first wave.


Introduction
As of April 2022, South Africa had experienced four waves of COVID-19, following the first reported case in early March 2020.In March 2020, while case numbers were still low, the South African government declared a national state of emergency and introduced strict lockdown legislation, with five lockdown levels [1][2][3].Lockdown started at level five, which forced the temporary closure of all non-essential businesses, the closure of international and interprovincial borders, an alcohol and tobacco prohibition, and a ban on leaving one's home except to access essential services.Lockdown restrictions relaxed, due to economic pressure, early during the onset of the first wave, and were reintroduced in adjusted forms during the second and third waves.The first wave was associated with primarily wild-type SARS-CoV-2, the second wave with the Beta variant (B.1.351), the third wave with the Delta variant (B.1.617.2), and the fourth wave with the Omicron variant (B.1.1.529.1)[4].During July 2021, civil unrest caused disruptions to laboratory and healthcare services in KwaZulu-Natal and Gauteng provinces.
In South Africa, roughly 17% of residents have access to private healthcare through health insurance, whereas the remaining 83% rely primarily on the public healthcare system [5].Clients of the private sector tend to have a higher mean income and live in areas with lower population density [6,7]; while people with lower socioeconomic status are likely to be at a higher risk of SARS-CoV-2 infection [7][8][9][10].As such, transmission patterns may be expected to differ between clients of the private versus public sector healthcare providers, due to the difficulty of adhering to lockdown measures in higher density areas [11].
The time-varying reproduction number R is the expected number of secondary cases caused by a single infected individual at a given point in time, assuming that conditions remain constant for the duration of the infectious period.R reflects inherent pathogen properties combined with environmental and social conditions such as population immunity, non-pharmaceutical interventions (NPIs), perceptions of disease, and access to healthcare.Reproduction number estimates are used to track transmission trends, assess the impacts of interventions, and parameterise epidemic models [12,13].
R estimates are typically based on time series data of cases or deaths, although any measure representing an approximately constant proportion of total incidence may be used, in conjunction with data from which to estimate the generation interval [14].For example, a time are also supported by the Department of Science and Innovation and the National Research Foundation (NRF).Any opinion, finding, and conclusion or recommendation expressed in this material is that of the authors and the NRF does not accept any liability in this regard.The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.series of deaths are sometimes used for reproduction number estimation, particularly in situations where the consistency of testing systems is called into question, as deaths are thought to be more systematically assessed and recorded than other measures of incidence.

Competing interests
We generated R estimates for COVID-19 in South Africa during the first four waves of the epidemic, nationally and by province.We also sought to examine whether R estimates differed according to data source (comparing the daily incidence of laboratory-confirmed COVID-19 cases, hospitalisations, and in hospital deaths), and to compare transmission patterns between the public and private sectors.

Data
Our study utilised two primary data sets maintained by the South African NICD.Data on laboratory-confirmed cases were obtained from the national Notifiable Medical Conditions Surveillance System (NMC-SS) line list, to which all pathology laboratories are legally required to report positive COVID-19 test results [24] and included both primary infections and suspected reinfections.Case report dates ranged from March 5 th , 2020, when the first case was reported, through April 28 th , 2022.Positive tests were classified as associated with a reinfection if more than 90 days had elapsed since the most recent positive test for the same patient [15].Data on laboratory-confirmed cases were filtered to include only cases confirmed via reverse-transcription polymerase chain reaction (rt-PCR) testing, due to substantial numbers of incorrectly entered reference dates for antigen tests.Data on hospital admissions and hospital-associated deaths were obtained from the national DATCOV dataset, to which all private (262) and public (407) hospitals report confirmed COVID-19-positive admissions and deaths in hospital [16].The generation interval distribution was approximated using a gamma distribution fit to data from PHIRST-C, a community cohort study of COVID-19 transmission (mean = 6.63 days, si mean = 0.51 days; si = 3.28 days, si si = 0.27 days; see R estimation subsection below) [17].Dates of symptom onset were available for 55% of hospitalised cases and 57% of in-hospital deaths.31 cases (0.007%) in the DATCOV database were missing both admission date and date of symptom onset and were excluded from the analyses based on admissions and deaths.The reinfections line list was linked with DATCOV to obtain dates of symptom onset for 5.6% of rt-PCR-confirmed cases (obtained through the DATCOV database).

Imputation
Missing onset dates in the source datasets were imputed using the multiple imputation chained equations technique, as implemented in the R packages mice [18] and countimp [19].The imputation procedure recursively estimates individual-level delays between symptom onset and hospital admission (for hospitalisations and deaths), or between symptom onset and date of reported case confirmation (for rt-PCR-confirmed cases).In each estimation step, a multivariate negative binomial model was fitted to predict delays based on 6 (lab-confirmed cases) or 7 (hospitalisations and deaths) other variables and used to predict delay values that were incurred but not reported.Poisson distributed generalised linear models were fitted to explore the relationships between the known delay values and individual predictor variables, in order to select the most relevant predictors.The variables used for imputations in the two line lists were: health sector where laboratory testing/hospital admission occurred (public or private), age group (in ten-year intervals), month of case report/hospital admission, case outcome (for admissions), day of hospital admission (for admissions), province (for rt-PCR-confirmed cases), and district (for admissions).

Adjustment for right-censoring
Two mechanisms exist by which the data are right-censored.The first occurs because some case confirmations, hospital admissions, and in hospital deaths correspond to dates of symptom onset that fall within the date range of the data, despite the events (case confirmation, hospital admission, or death) occurring outside the date range of the data.The second cause of right-censoring in the data comes from case confirmations, hospital admissions, and in-hospital deaths that occur within the date range of the data, but which are not yet included in the dataset.
The first source of right-censoring was accounted for by inflating the end of the time series according to the distributions of delays from symptom onset to reporting of test results, hospital admission, and death [14].Specifically, counts for each day were divided by the proportion of the appropriate delay distribution with delay larger than the difference between the day in question and the last date for which test results, hospital admissions, or deaths were reported.The second source of right-censoring was more difficult to adjust for rigorously, and was mitigated by truncating the last 3, 7, and 7 days (respectively) of the time series.

R estimation
Daily time series of rt-PCR-confirmed COVID-19 cases, hospitalizations, and deaths, by dates of symptom onset, were computed using the imputed line lists and adjusted for right-censoring.R was estimated using the method described by Thompson et al. and implemented in the R package EpiEstim [14], identified in a recent review of R estimation techniques as a suitable method given the data available during this study [20].The method assumes that current incidence results from the combined transmissibility of recently infected individuals, and can thus be calculated based on recent incidence values [12,14].In general terms, the number of new infections coming from individuals infected n days ago is assumed to be proportional to the product of the number of individuals infected n days ago, and the proportion of transmission events that occur between n and n + 1 days following infection.Specifically, the incidence I t (in time step t) is assumed to be Poisson distributed, with mean where R is the time-varying (instantaneous) reproduction number at time step t, and w s is the relative infectiousness of an individual s time steps following infection, assumed to be constant with respect to t.Furthermore, R is assumed to be constant over some time window τ, the choice of which involves a tradeoff between the width of the resulting credible intervals and the sensitivity of the method to rapid changes in R. We estimated R using 7, 14, and 21 day sliding windows; estimates using 7-day sliding windows are presented in the main text.The likelihoods of hypothetical values for R are calculated, resulting in a gamma-distributed Bayesian posterior for R. The infectiousness profile w s was estimated using a gamma-distributed generation interval fit to data from the PHIRST-C community cohort study [17] (μ SI = 6.63 days, σ μ-SI = 0.51 days; σ SI = 3.28 days, σ σ-SI = 0.27 days).In order to accommodate uncertainty in w s , n 1 pairs of (μ SI , σ SI ) are sampled from truncated normal distributions (with σ SI < μ SI ).From each of these n 1 pairs, n 2 posterior R estimates are generated, resulting in n 1 x n 2 R estimates, per imputation, for each time window.We generated R estimates using n 1 = n 2 = 25, resulting in 625 R estimates per time window per imputation.Several parameter combinations were explored to determine suitable values for n 1 and n 2 (see section 5 in S1 Appendix).No explicit smoothing of the time series or R estimates was performed.The R estimation procedure was performed on each of 25 imputed datasets.We present median values, between imputations, of the median R estimates and 95% credible intervals (CI) arising from the 625 individual R estimates per estimation window.
R was estimated based on rt-PCR-confirmed cases (R cases ), hospital admissions (R admissions ) and hospital-associated deaths (R deaths ).R was also estimated separately using public and private sector data; rt-PCR-confirmed cases were classified according to the sector of the laboratory where samples were processed, whereas hospitalised cases and in-hospital deaths were classified according to the sector of the hospital where admission took place.This distinction is relevant because some patients who were admitted to public hospitals paid out of pocket to access testing services from private laboratories.Laboratory and healthcare service providers were disrupted during a period of civil unrest in KwaZulu-Natal and Gauteng provinces.We indicate the period of unrest, between 10 and 19 July 2021, and provide a simplistic indicator of its effects on R estimates by adding a shaded area to the graph, with opacity equal to the proportion of the backwards-looking generation time which occurs during or before the unrest, as this is the weight applied to past incidence values when estimating R.

National and provincial R estimates using time series of cases
Nationally, R based on cases (R cases ) dropped sharply following the closure of borders and schools in mid-March 2020 (Fig 1), followed by an increase in mid-April.During levels five and four lockdown, R cases fluctuated around 1.25, with large credible intervals (Table 1 and Fig 1).R cases remained steady through the level three lockdown in June 2020, with values between 1 and 1.5, then began to decrease in late June; the maximum R cases in the first wave (after April 1 st 2020) was 1.55 (CI: 1.43-1.66).R cases crossed below 1 in mid-July and continued to decrease through early August.R cases increased gradually during level two lockdown and through most of level one lockdown, then began to decrease in the first half of December while incidence in the second wave was still increasing; maximum R cases in the second wave was 1.56 (CI: 1.47-1.64).The gradual decrease continued until the first week of January 2021, approximately one week after the introduction of adjusted level three lockdown, when R cases declined sharply, reaching a value of 0.56 (CI: 0.53-0.60)by the end of January.R cases then began to increase, in a pattern similar to that following the first wave, through May 2021, crossing above 1 in early April and peaking in mid-June; maximum R cases in the third wave was 1.46 (CI: 1.38-1.53).R cases dipped sharply from mid-June through late July, then climbed to a smaller peak in midto-late August.R cases decreased through late August and September 2021, then remained approximately constant from late September through October 2021.Starting in early November 2021, R cases increased rapidly throughout November 2021, reaching a maximum value of 3.33 (CI 2.83-3.97),and then decreased rapidly through the end of December.R cases increased slightly during January 2022, then remained constant during February and March.R cases increased quickly in early to mid April 2022 (Fig 1).
Trends in R cases between provinces were generally similar, although several provinces (Northern Cape, North West, and Free State) exhibited extended first waves.Estimates for  While the timing of transmission trends varied substantially between provinces during the first wave, peak R cases values during the second (Beta-dominated) wave occurred at similar times in most provinces, except in the Eastern Cape, where the second wave started approximately four weeks earlier than in other provinces; in addition, Limpopo, Mpumalanga, Northern Cape, and Free State experienced peak transmission slightly later than the more-densely populated provinces of Western Cape, Gauteng, and KwaZulu-Natal.
The timing and shape of the third (Delta-dominated) wave was more varied between provinces than the first two waves (Fig 2

Comparing R estimates based on different data endpoints
Trends in R estimates based on cases (R cases ), hospitalisations (R admissions ), and in hospital deaths (R deaths ) were generally similar during the first three waves, but R cases diverged from R admissions and R deaths during the fourth wave, with the peak R cases being higher than that from deaths and admissions.Throughout the epidemic, estimates based on deaths (and to a lesser extent admissions) were generally less stable and had wider credible intervals than estimates based on cases (see sections 2, 3 in S1 Appendix).Ratios between the three endpoints varied considerably over the course of the epidemic (see section 6 in S1 Appendix).
Shortly after the peak of the first wave, in July 2020, R cases dropped below R admissions and R deaths ; this occurred in several provinces as well as nationally.During the first wave, in mid-May 2020, R deaths was higher than R cases , which was in turn higher than R admissions .In November 2020, when R cases and R admissions dipped, as well as in late December 2020 and early January 2021, R deaths was higher than R cases and R admissions .In six out of nine provinces, R cases exceeded R admissions and R deaths during parts of June and July 2021.
In Gauteng, R admissions had lower maxima in both waves than R cases or R deaths ; R cases had similar maxima to R deaths but maintained these values for shorter periods.R cases reached lower post-wave minima than R admissions or R deaths .In KwaZulu-Natal, R admissions changed less drastically during the first two waves, with lower maxima and higher minima, than R cases or R deaths .Civil unrest in Gauteng and KwaZulu-Natal during July 2021 caused substantial disruptions to laboratory and healthcare services, with corresponding dips in R estimates followed by compensatory increases.All three endpoints were affected, although R cases was the most affected.
During the fourth wave in late 2021 and early 2022, R cases rose faster and higher than R admis- sions or R deaths .R estimates based on the three data endpoints peaked within two days of one another in early December 2021, after which R cases dropped more rapidly than R admissions , which in turn dropped faster than R deaths .Estimates based on the three endpoints converged again in mid-February 2022.

Public versus private sector
Transmission patterns between clients of the public and private sectors were generally similar.Public and private sector R estimates differed most during the initial level five and four lockdowns, and in R cases leading up to the peak of the fourth wave.At the end of the third wave in June/July 2021, and to a lesser extent at the end of the second wave, private-sector R dropped more rapidly than public-sector R, so that private-sector R was lower than public sector R (Fig 3).
This difference appears in estimates using all three data endpoints for the national-level analysis, as well as in Gauteng, Free State, and KwaZulu-Natal, and appeared in estimates based on cases in all provinces except the Western Cape (see section 3 in S1 Appendix).During the fourth wave, R cases in the private sector rose above R cases in the public sector.

Overview
The initial level five lockdown in early 2020 had a substantial impact on transmission but was insufficient to bring R below one (Table 1).It is difficult to disentangle the effects of subsequent lockdowns from those of increasing population immunity and other drivers of behavioural change.Even given high levels of preceding population immunity [21,22], the estimated peak R for the Omicron wave was substantially higher than previous waves.Average R values were lower in lower lockdown levels, probably because lockdown levels were increased in response to increasing transmission and lowered when transmission was deemed to be under control.R estimates based on the three data endpoints were similar overall, although in practice hospitalisations and deaths provided slightly less timeous estimates.During our regular public reporting of R estimates [23], data updates on hospitalised cases and deaths were typically delayed by one to two weeks.Combined with longer truncation of hospitalised cases and deaths to account for late arriving data, this meant that estimates of R cases were two to three weeks ahead of those for R admissions , and three weeks ahead of those for R deaths .R followed generally similar patterns by province, with notable exceptions including extended waves in some provinces, an early second wave in the Eastern Cape, and unrest-related changes to R estimates in Gauteng and KwaZulu-Natal during July and August 2021.R estimates in the private and public sectors were similar prior to the fourth wave, except during the level five and four lockdowns of early 2020, but diverged during the fourth wave in late 2021 and early 2022.
Several studies featuring R estimates for South Africa have been published since the start of the COVID-19 pandemic.However, aside from the regular reports we released via the National Institute for Communicable Diseases (NICD) [23], the studies we identified relied on publicly reported time series data, which do not include symptom onset dates and which may be affected by backfilling and additional reporting delays, and used international estimates for the generation interval [1,2,[24][25][26][27][28][29][30]; most studies also use a single measure of incidence and do not cover all of the first four waves.While currently available evidence suggests that clients of the public sector experienced higher levels of transmission during the first two waves [7,31], we did not identify any studies comparing reproduction number estimates in different income groups, or between healthcare sectors.
Our national-level R estimates are consistent with estimates from two other South African studies using comparable methods [27,32].McCarthy et al. estimated R for the period before March 18 th 2020 (when the first movement restrictions were enacted) at 4.15 (CI: 3.60-4.74),using a generation interval estimate of 5.7 +-2.7 days, and ignoring importation status [32]; we estimated an R of 3.80 (CI: 3.03-4.92)for the same period.Roussouw obtained R trajectories consistent with ours, with R crossing 1 at approximately the same time as using our estimates, and peaking at similar values [27].
Provinces with drawn-out first waves (Northern Cape, North West, and Free State) are the three provinces with the lowest population density and smallest populations, possibly reflecting discontinuous epidemics in more isolated populations [33].
Events in some provinces, such as the civil unrest in KwaZulu-Natal and Gauteng provinces during July 2021, were associated with sudden drops in R estimates, followed by compensatory increases (see Figs 1-3).In general, short-term decreases in incidence observation processes (such as disruptions to testing facilities or decreases in healthcare-seeking behaviour) led to transitory dips in R estimates, followed by compensatory increases.This pattern of decline and increase is typical during public holidays, such as the Easter weekend in 2021, which coincided with a slight drop in both R admissions and R cases across most provinces (Fig 2).In KwaZulu-Natal, R estimates based on admissions (R admissions ) spiked following a nosocomial outbreak at a private hospital in early April [34] (Fig 2).Furthermorea public hospital in Durban, Kwa-Zulu-Natal, reported a sudden surge of approximately100 admissions on May 26 th 2020 -this led to a spike, and subsequent dip, in R estimates for KZN.All of these factors highlight the fact that case-based estimates can be affected by changes in testing practice, which could bias R estimates, particularly over the short term.
While this work represents one lens through which to assess the effectiveness of lockdowns, thorough assessment of this important question (how effective were lockdowns?)would require additional sources of data, such as information on adherence to lockdown measures, and alternative modelling techniques beyond the scope of this work.Overall, the modelling approaches used here do not allow us to develop appropriate counter-factuals which would be necessary to examine these types of questions.Similarly, we did not have access to data which would allow us to clearly identify and differentiate between the causes of differences in R estimates based on different data endpoints.

Different data endpoints
R estimation using hospital admissions and deaths may prove valuable in scenarios where laboratory testing is limited or unavailable to the public or if there is changing access to testing throughout the pandemic, for example, policies restricting testing to severe cases during epidemic peaks such as those implemented in the Western Cape [35].Numbers of hospitalisations should be more robust to changes in test availability, although changing hospitalisation admission criteria may coincide with changes in the laboratory testing data; similarly, in-hospital deaths may change relative to incidence of infections due to improvements in treatment practices leading to reduced mortality over time.Overall, the general agreement of R estimates between endpoints prior to the fourth wave is encouraging, and points to the consistency of the imputation procedure.While agreement between endpoints would also suggest similarities in the levels of representativeness of each data endpoint, the ratios of incidence from the three time-series varied substantially over the time period considered (see section 6 in S1 Appendix).
Stricter hospital admission practices near times of peak transmission, and corresponding relaxations in hospital admission practices following wave peaks, could have resulted in decreasing representativeness of hospitalisations (as a measure of incidence) during wave peaks and increasing representativeness following waves; this could explain the less extreme values and slower changes in R admissions relative to R cases .On the other hand, testing-seeking behaviour may shift during waves, with heightened levels of concern during the beginning of waves leading to increases in test seeking and case confirmations.As wave peaks pass, people may be less concerned, leading to a reduction in test-seeking behaviour and corresponding decrease in R estimates based on confirmed cases.Divergence between R cases and R admissions / R deaths during the fourth wave was likely caused by a combination of the above factors with reduced severity of outcomes for people infected with the Omicron variant relative to the previously dominant Delta variant, causing the proportion of underlying infections which resulted in hospital admissions to decrease [36,37].
Were surveillance resources to be scaled back in the future and the reliability of one or two of the data endpoints called into question, our results suggest that the remaining data endpoints could still be used to monitor changes in transmission.However, robust surveillance of all three data endpoints should be maintained in some areas for validation purposes, particularly as increasing decoupling of COVID-19 cases, hospitalisations, and deaths is likely going forward.

Public versus private sector
The similarities between public-and private-sector R estimates are of particular interest in light of several seroprevalence studies which suggest that clients of the public sector experienced higher levels of SARS-CoV-2 transmission [7,31,38].It is also important to note the under-representation of clients of the public sector in all three data endpoints relative to clients of the private sector, as clients of the private sector represent approximately 17% of the South African population, yet only 52% of recorded cases came from public sector testing facilities.
The most straightforward reason for the similarity of public and private sector R estimates is the fact that clients of the public and private sectors do not form two separate and isolated populations with regard to respiratory virus transmission-although there may have been less mixing during the initial highly-restrictive lockdown levels five and four, leading to differing R estimates during this period.In addition, several biases could have made it more difficult to detect differences in transmission: for example, healthcare-seeking behaviour driven by heightened concern during periods of high incidence may have increased the proportion of infections that appeared in private sector data, inflating private sector R estimates near peaks in R.This explanation is consistent with the higher R cases based on private sector data during the fourth wave.Capacity limitations of public sector healthcare providers and laboratory services (particularly during the first wave -see section 7 in S1 Appendix -when public sector testing experienced substantial processing delays and backlogs) may also have led to decreasing representativeness of public sector data near peaks in case and hospital admission numbers.
Unbiased R estimation requires that measures of disease incidence represent constant proportions of the true incidence of infections.In addition, the generation interval may have shifted over time with the appearance of new variants and changes in disease-related behaviours.Changes in laboratory capacity, including availability of test kits and reagents, levels of population immunity (whether pre-existing, infection-induced, or vaccine-induced), healthcare-seeking behaviour, treatment of COVID-19 disease, circulating SARS-CoV-2 strains, and data collection practices may all bias R estimates.
Furthermore, a number of factors may have altered mortality outcomes over time, including the introduction of dexamethasone treatment in mid-June 2020, the use of oxygen administration via high-flow nasal cannula, changes in the quality of healthcare provided if health systems are overwhelmed, changing levels of vaccine-and infection-induced immunity, and potential differences in severity between initially circulating viruses and the different variants that dominated the second, third, and fourth waves.Combined, these factors may lead to perturbations in the time series data that are unrelated to transmission.Furthermore, the distributions of delays between symptom onset and case report/admission/death may change over time, which would affect the accuracy of adjustments for right-censoring at the end of the time series.Due to the impact of civil unrest, which led to reductions in testing rates in KwaZulu-Natal and Gauteng provinces in July 2021, trends in R estimates during that time for these provinces, as well as nationally, should be interpreted with caution.
The agreement of R estimates based on the three data endpoints prior to the fourth wave, along with the fact that many of the likely biases would affect only one or two of the three endpoints, suggests that either the above biases were relatively small and / or together moved the three data endpoints in similar ways.
Early (pre-lockdown) incidence included a substantial portion of imported cases, and our early R estimates are likely not an accurate reflection of transmission trends; the sharp drop in R estimates following the initial border closures may be attributed in part to the sudden decrease in imported cases (see McCarthy et al. [32] for early R estimates incorporating data on importation status).
Future work could include comparison with other R estimation methods [20,39], methods to correct for holidays and disruptions to surveillance processes (e.g.due to civil unrest), and disentangling the role of immunity from trends in R.

Conclusion
We conducted a robust process of R estimation for COVID-19 in South Africa.The use of high-resolution data allowed us to directly compare estimates based on different data endpoints, since all estimates were based on symptom onset dates, to compare estimates between the public and private sectors, and to exclude antigen testing due to issues with data quality and reporting completeness.
We found that different data endpoints yielded similar R estimates during the first three waves but diverged during the fourth wave, suggesting that, while useful R estimates could potentially be obtained from only one or two data endpoints, particularly if virus severity remained unchanged, decision-makers should where possible consider R estimates using multiple data endpoints.We found similar R estimates using public and private sector data, although clients of the public sector were heavily underrepresented in the data, and private sector R cases was higher than public sector R cases during the fourth wave.
Although R estimates provide limited resolution for understanding the drivers of transmission, the estimates presented here served a valuable role in the South African COVID-19 response efforts by providing routine monitoring of transmission trends.

Fig 1 .
Fig 1. R estimates for each data endpoint, based on national daily time series of rt-PCR-confirmed cases, hospitalisations, and deaths.R estimates for each data endpoint (upper panel), South Africa, based on (lower panel) national daily time series of rt-PCR-confirmed cases, hospitalisations, and deaths.R estimated using 7-day sliding windows, from early March 2020 through 25 April.Results reflect median values (between imputations) of median R estimates and associated 2.5% and 97.5% credible intervals.L = Level.Red-shaded areas indicate the period during which civil unrest caused severe disruptions to surveillance in KwaZulu-Natal and Gauteng provinces; grey-shaded areas indicate gradually diminishing effects on R estimates.https://doi.org/10.1371/journal.pone.0287026.g001

Fig 2 .
Fig 2. Province-level R estimates for each data endpoint from early March 2020 through 25 October 2022.R estimated on 7-day sliding windows.Results reflect median values (between imputations) of median R estimates and associated 2.5% and 97.5% credible intervals.L = Level.Red-shaded areas indicate the period during which civil unrest caused severe disruptions to surveillance in KwaZulu-Natal and Gauteng provinces; grey-shaded areas indicate gradually diminishing effects on R estimates.https://doi.org/10.1371/journal.pone.0287026.g002

Fig 3 .
Fig 3. R estimates by sector, based on rt-PCR-confirmed COVID-19 cases, hospitalisations, and deaths.R estimates by sector, based on rt-PCR-confirmed COVID-19 cases (upper panel), hospitalisations (middle panel), and deaths (lower panel), South Africa.R estimates were generated using 7-day sliding windows.Results reflect median values (between imputations) of median R estimates and associated 2.5% and 97.5% credible intervals.L = Level.Red-shaded areas indicate the period during which civil unrest caused severe disruptions to surveillance in KwaZulu-Natal and Gauteng provinces; grey-shaded areas indicate gradually diminishing effects on R estimates.https://doi.org/10.1371/journal.pone.0287026.g003

:
CC has received grant support from Sanofi Pasteur, US CDC, Wellcome Trust, Programme for Applied Technologies in Health (PATH), Bill & Melinda Gates Foundation and South African Medical Research Council (SA-MRC).JRCP has received funding for COVIDrelated work from Bill & Melinda Gates Foundation, WHO AFRO, and Wellcome Trust and serves on the Ministerial Advisory Committee for COVID-19 for the South African National Department of Health.The other authors report no known conflicts of interest.This does not alter our adherence to PLOS ONE policies on sharing data and materials.
This study has received ethical clearance from University of the Witwatersrand Human Research Ethics Committee (clearance certificate no.M210752, formerly M160667) and approval under reciprocal review from Stellenbosch University Health Research Ethics Committee (project ID 19330, ethics reference no.N20/11/074_RECIP_WITS_M160667_COVID-19).All data was received in fully anonymized form.The need to obtain participant consent was waived by the ethics committees.